07. Dealing with Missingness

Readings on model-based imputation:

  1. You can review this link on MICE imputation
  2. Maximum likelihood estimation is another common statistical tool for handling missing values. Rather than deleting records, it estimates model parameters from all of the observed data.

  3. Check out this link on Multiple Imputation

  4. Listwise deletion:
    You eliminate the entire observation if even a single feature is missing. I am personally not a fan of this approach because it can dramatically reduce the number of observations.

  5. Pairwise deletion:
    You exclude only the missing values themselves, analysis by analysis, rather than dropping the whole record. When you are computing a correlation matrix, for example, each correlation uses every record in which both features are present, so you do not have to eliminate entire records.

  6. Mean and mode substitution:
    For missing values, you “fill in” with either the mean (typically for numeric features) or the mode (typically for categorical features) of that entire feature. Use your discretion on which substitution will most appropriately reflect the population distribution of the feature.
  7. Check the distribution of missingness in the data. You can tabulate missingness per feature to see how much of your data is missing. Does it follow a pattern? How does missingness in one feature relate to missingness in another?
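
If it helps to see items 4 through 7 in code, here is a minimal sketch using pandas. The toy DataFrame and its column names (`age`, `income`, `city`) are made up for illustration and are not from the lesson; treat this as one possible way to try the ideas, not the recommended workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 47.0, np.nan, 52.0],
    "income": [40.0, 55.0, np.nan, 80.0, 62.0, 90.0],
    "city": ["A", "B", None, "A", "A", "B"],
})

# Item 7: share of missing values per feature, and how missingness
# in one feature correlates with missingness in another.
missing_share = df.isna().mean()
missing_corr = df.isna().astype(int).corr()

# Item 4: listwise deletion drops every row that has any missing value.
listwise = df.dropna()

# Item 5: pairwise deletion. DataFrame.corr excludes missing values
# pair by pair, so each correlation uses all rows where both columns are present.
pairwise_corr = df[["age", "income"]].corr()

# Item 6: mean substitution for numeric features, mode for categorical ones.
filled = df.copy()
for col in ["age", "income"]:
    filled[col] = filled[col].fillna(filled[col].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode().iloc[0])

print(missing_share)
print(f"rows kept by listwise deletion: {len(listwise)} of {len(df)}")
```

On this toy data, listwise deletion keeps only half of the rows, which is the cost described in item 4. Mean and mode substitution keeps every row, but it shrinks the variance of the filled features, so compare the distributions before and after filling.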